{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "title"
   },
   "source": [
    "# Custom training and online prediction\n",
    "\n",
    "## Overview\n",
    "\n",
    "This tutorial demonstrates how to use the Vertex SDK for Python to train and deploy a custom image classification model for online prediction.\n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "- Create a Vertex AI custom job for training a model.\n",
    "- Train a TensorFlow model.\n",
    "- Deploy the `Model` resource to a serving `Endpoint` resource.\n",
    "- Make a prediction.\n",
    "- Undeploy the `Model` resource.\n",
    "\n",
    "\n",
    "## Introduction \n",
    "\n",
    "In this notebook, you use the Vertex SDK for Python to create a custom-trained model from a Python script that runs in a Docker container, and then request a prediction from the deployed model. Alternatively, you can create custom-trained models using the `gcloud` command-line tool or the Cloud Console.\n",
    "\n",
    "Each learning objective corresponds to a __#TODO__ in this student lab notebook. Try to complete this notebook first, and then review the [solution notebook](../solutions/sdk-custom-image-classification-online.ipynb).\n",
    "\n",
    "### Dataset\n",
    "\n",
    "This tutorial uses the [CIFAR-10 dataset](https://www.tensorflow.org/datasets/catalog/cifar10) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). The version of the dataset you use is built into TensorFlow. The trained model predicts which of ten classes an image belongs to: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, or truck.\n",
    "\n",
    "**Make sure to enable the Vertex AI API and Compute Engine API.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_aip"
   },
   "source": [
    "## Installation\n",
    "\n",
    "Install the latest version of the Vertex SDK for Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "1fd00fa70a2a"
   },
   "outputs": [],
   "source": [
    "# Set up your dependencies\n",
    "import os\n",
    "\n",
    "# The Google Cloud Notebook product has specific requirements\n",
    "IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists(\"/opt/deeplearning/metadata/env_version\")\n",
    "\n",
    "# Google Cloud Notebook requires dependencies to be installed with '--user'\n",
    "USER_FLAG = \"\"\n",
    "if IS_GOOGLE_CLOUD_NOTEBOOK:\n",
    "    USER_FLAG = \"--user\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "YsxCgt1zlugo"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: google-cloud-aiplatform in /opt/conda/lib/python3.7/site-packages (1.1.1)\n",
      "Collecting google-cloud-aiplatform\n",
      "  Downloading google_cloud_aiplatform-1.3.0-py2.py3-none-any.whl (1.3 MB)\n",
      "Requirement already satisfied: google-cloud-storage<2.0.0dev,>=1.32.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.41.1)\n",
      "Requirement already satisfied: google-api-core[grpc]<3.0.0dev,>=1.26.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.31.1)\n",
      "Requirement already satisfied: proto-plus>=1.10.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.19.0)\n",
      "Requirement already satisfied: packaging>=14.3 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (21.0)\n",
      "Requirement already satisfied: google-cloud-bigquery<3.0.0dev,>=1.15.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (2.23.2)\n",
      "Requirement already satisfied: six>=1.13.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.16.0)\n",
      "Requirement already satisfied: setuptools>=40.3.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (49.6.0.post20210108)\n",
      "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2.25.1)\n",
      "Requirement already satisfied: google-auth<2.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.34.0)\n",
      "Requirement already satisfied: pytz in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2021.1)\n",
      "Requirement already satisfied: protobuf>=3.12.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (3.16.0)\n",
      "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.53.0)\n",
      "Requirement already satisfied: grpcio<2.0dev,>=1.29.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.38.1)\n",
      "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (0.2.7)\n",
      "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (4.2.2)\n",
      "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (4.7.2)\n",
      "Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.3.2)\n",
      "Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.4.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.7.2)\n",
      "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.1.2)\n",
      "Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.14.6)\n",
      "Requirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.20)\n",
      "Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=14.3->google-cloud-aiplatform) (2.4.7)\n",
      "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (0.4.8)\n",
      "Requirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (4.0.0)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.26.6)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2021.5.30)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2.10)\n",
      "Installing collected packages: google-cloud-aiplatform\n",
      "  WARNING: The script tb-gcp-uploader is installed in '/home/jupyter/.local/bin' which is not on PATH.\n",
      "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n",
      "Successfully installed google-cloud-aiplatform-1.3.0\n"
     ]
    }
   ],
   "source": [
    "# Upgrade the specified package to the newest available version\n",
    "! pip install {USER_FLAG} --upgrade google-cloud-aiplatform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_storage"
   },
   "source": [
    "Install the latest GA version of the *google-cloud-storage* library as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "qssss-KSlugo"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: google-cloud-storage in /opt/conda/lib/python3.7/site-packages (1.41.1)\n",
      "Collecting google-cloud-storage\n",
      "  Downloading google_cloud_storage-1.42.0-py2.py3-none-any.whl (105 kB)\n",
      "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (1.34.0)\n",
      "Requirement already satisfied: google-api-core<3.0dev,>=1.29.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (1.31.1)\n",
      "Requirement already satisfied: google-resumable-media<3.0dev,>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (1.3.2)\n",
      "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.25.1)\n",
      "Requirement already satisfied: google-cloud-core<3.0dev,>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (1.7.2)\n",
      "Requirement already satisfied: setuptools>=40.3.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (49.6.0.post20210108)\n",
      "Requirement already satisfied: protobuf>=3.12.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (3.16.0)\n",
      "Requirement already satisfied: pytz in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (2021.1)\n",
      "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (1.53.0)\n",
      "Requirement already satisfied: packaging>=14.3 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (21.0)\n",
      "Requirement already satisfied: six>=1.13.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (1.16.0)\n",
      "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (0.2.7)\n",
      "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (4.7.2)\n",
      "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (4.2.2)\n",
      "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage) (1.1.2)\n",
      "Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage) (1.14.6)\n",
      "Requirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=1.3.0->google-cloud-storage) (2.20)\n",
      "Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=14.3->google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (2.4.7)\n",
      "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-cloud-storage) (0.4.8)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2021.5.30)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2.10)\n",
      "Requirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (4.0.0)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (1.26.6)\n",
      "Installing collected packages: google-cloud-storage\n",
      "Successfully installed google-cloud-storage-1.42.0\n"
     ]
    }
   ],
   "source": [
    "! pip install {USER_FLAG} --upgrade google-cloud-storage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_pillow"
   },
   "source": [
    "Install the *pillow* library for loading images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "vhP4dtWUlugp"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pillow in /opt/conda/lib/python3.7/site-packages (8.3.1)\n"
     ]
    }
   ],
   "source": [
    "! pip install {USER_FLAG} --upgrade pillow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_numpy"
   },
   "source": [
    "Install the *numpy* library for manipulation of image data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "80-_pO4olugp"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (1.19.5)\n",
      "Collecting numpy\n",
      "  Downloading numpy-1.21.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)\n",
      "Installing collected packages: numpy\n",
      "  WARNING: The scripts f2py, f2py3 and f2py3.7 are installed in '/home/jupyter/.local/bin' which is not on PATH.\n",
      "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n",
      "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
      "tensorflow-io 0.18.0 requires tensorflow-io-gcs-filesystem==0.18.0, which is not installed.\n",
      "tfx-bsl 1.2.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.13.0 which is incompatible.\n",
      "tfx-bsl 1.2.0 requires google-api-python-client<2,>=1.7.11, but you have google-api-python-client 2.15.0 which is incompatible.\n",
      "tfx-bsl 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.23.2 which is incompatible.\n",
      "tfx-bsl 1.2.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.2 which is incompatible.\n",
      "tfx-bsl 1.2.0 requires pyarrow<3,>=1, but you have pyarrow 5.0.0 which is incompatible.\n",
      "tensorflow 2.5.0 requires grpcio~=1.34.0, but you have grpcio 1.38.1 which is incompatible.\n",
      "tensorflow 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.\n",
      "tensorflow 2.5.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.\n",
      "tensorflow 2.5.0 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.0 which is incompatible.\n",
      "tensorflow-transform 1.2.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.13.0 which is incompatible.\n",
      "tensorflow-transform 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.23.2 which is incompatible.\n",
      "tensorflow-transform 1.2.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.2 which is incompatible.\n",
      "tensorflow-transform 1.2.0 requires pyarrow<3,>=1, but you have pyarrow 5.0.0 which is incompatible.\n",
      "apache-beam 2.31.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.\n",
      "apache-beam 2.31.0 requires numpy<1.21.0,>=1.14.3, but you have numpy 1.21.2 which is incompatible.\n",
      "apache-beam 2.31.0 requires pyarrow<5.0.0,>=0.15.1, but you have pyarrow 5.0.0 which is incompatible.\n",
      "apache-beam 2.31.0 requires typing-extensions<3.8.0,>=3.7.0, but you have typing-extensions 3.10.0.0 which is incompatible.\n",
      "Successfully installed numpy-1.21.2\n"
     ]
    }
   ],
   "source": [
    "! pip install {USER_FLAG} --upgrade numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: You can ignore any dependency incompatibility warnings and errors that `pip` reports during these installs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "restart"
   },
   "source": [
    "### Restart the kernel\n",
    "\n",
    "Once you've installed everything, you need to restart the notebook kernel so it can find the packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "bzPxhxS5lugp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "if not os.getenv(\"IS_TESTING\"):\n",
    "    # Automatically restart kernel after installs\n",
    "    import IPython\n",
    "\n",
    "    app = IPython.Application.instance()\n",
    "    app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "project_id"
   },
   "source": [
    "#### Set your project ID\n",
    "\n",
    "Replace `YOUR-PROJECT-ID` with your project ID. **If you don't know your project ID**, you may be able to retrieve it using `gcloud`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "autoset_project_id"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Project ID:  qwiklabs-gcp-04-6c865137f72a\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "PROJECT_ID = \"YOUR-PROJECT-ID\"\n",
    "\n",
    "if not os.getenv(\"IS_TESTING\"):\n",
    "    # Get your Google Cloud project ID from gcloud\n",
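    "    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null\n",
    "    # Keep the placeholder if gcloud returns no project\n",
    "    if shell_output and shell_output[0]:\n",
    "        PROJECT_ID = shell_output[0]\n",
    "    print(\"Project ID: \", PROJECT_ID)"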
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "set_project_id"
   },
   "source": [
    "Otherwise, set your project ID here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "USd_pUT0lugr"
   },
   "outputs": [],
   "source": [
    "if PROJECT_ID == \"\" or PROJECT_ID is None:\n",
    "    PROJECT_ID = \"YOUR-PROJECT-ID\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "timestamp"
   },
   "source": [
    "#### Timestamp\n",
    "\n",
    "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users, you create a timestamp for each session and append it to the names of the resources you create in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "c-pX32xalugs"
   },
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "from datetime import datetime\n",
    "\n",
    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")"
   ]
  },
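  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of how the timestamp can be used, the example below appends it to a resource name. The `unique_name` helper and the `cifar10_job` prefix are hypothetical names chosen for illustration; they are not part of the Vertex SDK for Python.\n",
    "\n",
    "```python\n",
    "from datetime import datetime\n",
    "\n",
    "# Same format as the TIMESTAMP variable above: YYYYMMDDHHMMSS\n",
    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
    "\n",
    "\n",
    "def unique_name(prefix, timestamp=TIMESTAMP):\n",
    "    # Hypothetical helper: make a resource name unique per session\n",
    "    return f\"{prefix}_{timestamp}\"\n",
    "\n",
    "\n",
    "print(unique_name(\"cifar10_job\"))\n",
    "```\n",
    "\n",
    "Because the suffix changes with every session, two users who share a project are unlikely to create resources with the same name."
   ]
  },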
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "vF60K5v1lugs"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "# If you are running this notebook in Colab, run this cell and follow the\n",
    "# instructions to authenticate your GCP account. This provides access to your\n",
    "# Cloud Storage bucket and lets you submit training jobs and prediction\n",
    "# requests.\n",
    "\n",
    "# The Google Cloud Notebook product has specific requirements\n",
    "IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists(\"/opt/deeplearning/metadata/env_version\")\n",
    "\n",
    "# If on Google Cloud Notebooks, then don't execute this code\n",
    "if not IS_GOOGLE_CLOUD_NOTEBOOK:\n",
    "    if \"google.colab\" in sys.modules:\n",
    "        from google.colab import auth as google_auth\n",
    "\n",
    "        google_auth.authenticate_user()\n",
    "\n",
    "    # If you are running this notebook locally, replace the string below with the\n",
    "    # path to your service account key and run this cell to authenticate your GCP\n",
    "    # account.\n",
    "    elif not os.getenv(\"IS_TESTING\"):\n",
    "        %env GOOGLE_APPLICATION_CREDENTIALS ''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bucket:custom"
   },
   "source": [
    "### Create a Cloud Storage bucket\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "When you submit a training job using the Vertex SDK for Python, you upload a Python package\n",
    "containing your training code to a Cloud Storage bucket. Vertex AI runs\n",
    "the code from this package. In this tutorial, Vertex AI also saves the\n",
    "trained model that results from your job in the same bucket. Using this model artifact, you can then\n",
    "create Vertex AI model and endpoint resources to serve\n",
    "online predictions.\n",
    "\n",
    "Set the name of your Cloud Storage bucket below. It must be unique across all\n",
    "Cloud Storage buckets.\n",
    "\n",
    "You may also change the `REGION` variable, which is used for operations\n",
    "throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are\n",
    "available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You cannot\n",
    "use a multi-region Cloud Storage bucket for training with Vertex AI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "bucket"
   },
   "outputs": [],
   "source": [
    "BUCKET_NAME = \"gs://YOUR-BUCKET-NAME\"  # @param {type:\"string\"}\n",
    "REGION = \"YOUR-REGION\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "autoset_bucket"
   },
   "outputs": [],
   "source": [
    "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"gs://YOUR-BUCKET-NAME\":\n",
    "    BUCKET_NAME = \"gs://\" + PROJECT_ID + \"-aip-\" + TIMESTAMP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "create_bucket"
   },
   "source": [
    "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "Oz8J0vmSlugt"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating gs://qwiklabs-gcp-04-6c865137f72a/...\n"
     ]
    }
   ],
   "source": [
    "! gcloud storage buckets create --location=$REGION $BUCKET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "validate_bucket"
   },
   "source": [
    "Finally, validate access to your Cloud Storage bucket by examining its contents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "oadE10x2lugu"
   },
   "outputs": [],
   "source": [
    "! gcloud storage ls --all-versions --long $BUCKET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "setup_vars"
   },
   "source": [
    "### Set up variables\n",
    "\n",
    "Next, set up some variables used throughout the tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "import_aip"
   },
   "source": [
    "#### Import Vertex SDK for Python\n",
    "\n",
    "Import the Vertex SDK for Python into your Python environment and initialize it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "cNEiwLd0lugu"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "from google.cloud import aiplatform\n",
    "from google.cloud.aiplatform import gapic as aip\n",
    "\n",
    "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "accelerators:training,prediction"
   },
   "source": [
    "#### Set hardware accelerators\n",
    "\n",
    "You can set hardware accelerators for both training and prediction.\n",
    "\n",
    "Set the variables `TRAIN_GPU`/`TRAIN_NGPU` and `DEPLOY_GPU`/`DEPLOY_NGPU` to the GPU type to use and the number of GPUs allocated to each virtual machine (VM) instance. Doing so also requires a container image that supports GPUs. For example, to use a GPU container image with 4 Nvidia Tesla K80 GPUs allocated to each VM, you would specify:\n",
    "\n",
    "    (aip.AcceleratorType.NVIDIA_TESLA_K80, 4)\n",
    "\n",
    "See the [locations where accelerators are available](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators).\n",
    "\n",
    "Otherwise, specify `(None, None)` to use a container image that runs on a CPU.\n",
    "\n",
    "*Note*: TensorFlow GPU releases earlier than 2.3 fail to load the custom model in this tutorial. The failure is caused by static graph operations that are generated in the serving function, a known issue that is fixed in TensorFlow 2.3. If you encounter this issue with your own custom models, use a container image for TensorFlow 2.3 or later with GPU support."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "xd5PLXDTlugv"
   },
   "outputs": [],
   "source": [
    "TRAIN_GPU, TRAIN_NGPU = (aip.AcceleratorType.NVIDIA_TESLA_K80, 1)\n",
    "\n",
    "DEPLOY_GPU, DEPLOY_NGPU = (aip.AcceleratorType.NVIDIA_TESLA_K80, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "container:training,prediction"
   },
   "source": [
    "#### Set pre-built containers\n",
    "\n",
    "Vertex AI provides pre-built containers to run training and prediction.\n",
    "\n",
    "For the latest list, see [Pre-built containers for training](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) and [Pre-built containers for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "1u1mr18jlugv"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training: gcr.io/cloud-aiplatform/training/tf-gpu.2-1:latest AcceleratorType.NVIDIA_TESLA_K80 1\n",
      "Deployment: gcr.io/cloud-aiplatform/prediction/tf2-gpu.2-1:latest AcceleratorType.NVIDIA_TESLA_K80 1\n"
     ]
    }
   ],
   "source": [
    "TRAIN_VERSION = \"tf-gpu.2-1\"\n",
    "DEPLOY_VERSION = \"tf2-gpu.2-1\"\n",
    "\n",
    "TRAIN_IMAGE = \"gcr.io/cloud-aiplatform/training/{}:latest\".format(TRAIN_VERSION)\n",
    "DEPLOY_IMAGE = \"gcr.io/cloud-aiplatform/prediction/{}:latest\".format(DEPLOY_VERSION)\n",
    "\n",
    "print(\"Training:\", TRAIN_IMAGE, TRAIN_GPU, TRAIN_NGPU)\n",
    "print(\"Deployment:\", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "machine:training,prediction"
   },
   "source": [
    "#### Set machine types\n",
    "\n",
    "Next, set the machine types to use for training and prediction.\n",
    "\n",
    "- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure your compute resources for training and prediction.\n",
    "  - `machine type`\n",
    "    - `n1-standard`: 3.75 GB of memory per vCPU\n",
    "    - `n1-highmem`: 6.5 GB of memory per vCPU\n",
    "    - `n1-highcpu`: 0.9 GB of memory per vCPU\n",
    "  - `vCPUs`: one of 2, 4, 8, 16, 32, 64, or 96\n",
    "\n",
    "*Note: The following are not supported for training:*\n",
    "\n",
    " - `standard`: 2 vCPUs\n",
    " - `highcpu`: 2, 4 and 8 vCPUs\n",
    "\n",
    "*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "YAXwbqKKlugv"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train machine type n1-standard-4\n",
      "Deploy machine type n1-standard-4\n"
     ]
    }
   ],
   "source": [
    "MACHINE_TYPE = \"n1-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "TRAIN_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Train machine type\", TRAIN_COMPUTE)\n",
    "\n",
    "MACHINE_TYPE = \"n1-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "DEPLOY_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Deploy machine type\", DEPLOY_COMPUTE)"
   ]
  },
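  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The machine type rules above can be sketched as a small validation helper. This is only an illustration of the constraints listed in this notebook; `compose_machine_type`, `VALID_VCPUS`, and `UNSUPPORTED_FOR_TRAINING` are hypothetical names, and the Vertex AI documentation remains the authoritative list of supported machine types.\n",
    "\n",
    "```python\n",
    "# vCPU counts listed in this notebook for the n1 machine families\n",
    "VALID_VCPUS = {2, 4, 8, 16, 32, 64, 96}\n",
    "\n",
    "# Combinations this notebook lists as not supported for training\n",
    "UNSUPPORTED_FOR_TRAINING = {\n",
    "    \"n1-standard\": {2},\n",
    "    \"n1-highcpu\": {2, 4, 8},\n",
    "}\n",
    "\n",
    "\n",
    "def compose_machine_type(family, vcpus, for_training=False):\n",
    "    # Hypothetical helper: build a machine type string such as \"n1-standard-4\"\n",
    "    if vcpus not in VALID_VCPUS:\n",
    "        raise ValueError(f\"Unsupported vCPU count: {vcpus}\")\n",
    "    if for_training and vcpus in UNSUPPORTED_FOR_TRAINING.get(family, set()):\n",
    "        raise ValueError(f\"{family}-{vcpus} is not supported for training\")\n",
    "    return f\"{family}-{vcpus}\"\n",
    "\n",
    "\n",
    "print(compose_machine_type(\"n1-standard\", 4, for_training=True))  # n1-standard-4\n",
    "```\n",
    "\n",
    "Checking the combination before you submit a job can surface a configuration mistake locally instead of after the job request is rejected."
   ]
  },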
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tutorial_start:custom"
   },
   "source": [
    "# Tutorial\n",
    "\n",
    "Now you are ready to start creating your own custom-trained model with CIFAR10."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "train_custom_model"
   },
   "source": [
    "## Train a model\n",
    "\n",
    "There are two ways you can train a custom model using a container image:\n",
    "\n",
    "- **Use a Google Cloud prebuilt container**. If you use a prebuilt container, you will additionally specify a Python package to install into the container image. This Python package contains your code for training a custom model.\n",
    "\n",
    "- **Use your own custom container image**. If you use your own container, the container needs to contain your code for training a custom model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "train_custom_job_args"
   },
   "source": [
    "### Define the command args for the training script\n",
    "\n",
    "Prepare the command-line arguments to pass to your training script.\n",
    "\n",
    "- `args`: The command-line arguments to pass to the corresponding Python module. In this example, they are:\n",
    "  - `\"--epochs=\" + EPOCHS`: The number of epochs for training.\n",
    "  - `\"--steps=\" + STEPS`: The number of steps (batches) per epoch.\n",
    "  - `\"--distribute=\" + TRAIN_STRATEGY`: The training distribution strategy to use for single or distributed training.\n",
    "     - `\"single\"`: single device.\n",
    "     - `\"mirror\"`: all GPU devices on a single compute instance.\n",
    "     - `\"multi\"`: all GPU devices on all compute instances."
   ]
  },
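  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of how these strings are consumed, the sketch below builds an argument list from placeholder values and parses it with a reduced copy of the parser that `task.py` defines later in this notebook. The values `5`, `100`, and `'single'` and the name `example_args` are assumptions made for this example, not the lab's settings.\n",
    "\n",
    "```python\n",
    "import argparse\n",
    "\n",
    "# Placeholder values, for illustration only.\n",
    "EPOCHS = 5\n",
    "STEPS = 100\n",
    "TRAIN_STRATEGY = 'single'\n",
    "\n",
    "# Each flag and its value are joined into a single '--flag=value' string.\n",
    "example_args = [\n",
    "    '--epochs=' + str(EPOCHS),\n",
    "    '--steps=' + str(STEPS),\n",
    "    '--distribute=' + TRAIN_STRATEGY,\n",
    "]\n",
    "\n",
    "# Reduced copy of the parser in task.py, to show how the strings are read back.\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--epochs', default=10, type=int)\n",
    "parser.add_argument('--steps', default=200, type=int)\n",
    "parser.add_argument('--distribute', default='single', type=str)\n",
    "\n",
    "parsed = parser.parse_args(example_args)\n",
    "print(parsed.epochs, parsed.steps, parsed.distribute)\n",
    "```\n",
    "\n",
    "Because each flag and its value are joined into one `--flag=value` string, numeric values have to be converted to strings before concatenation; `argparse` converts them back to `int` on the training side."
   ]
  },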
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "1npiDcUtlugw"
   },
   "outputs": [],
   "source": [
    "# TODO 1\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "taskpy_contents"
   },
   "source": [
    "#### Training script\n",
    "\n",
    "In the next cell, you will write the contents of the training script, `task.py`. In summary:\n",
    "\n",
    "- Get the directory where to save the model artifacts from the environment variable `AIP_MODEL_DIR`. This variable is set by the training service.\n",
    "- Loads CIFAR10 dataset from TF Datasets (tfds).\n",
    "- Builds a model using TF.Keras model API.\n",
    "- Compiles the model (`compile()`).\n",
    "- Sets a training distribution strategy according to the argument `args.distribute`.\n",
    "- Trains the model (`fit()`) with epochs and steps according to the arguments `args.epochs` and `args.steps`\n",
    "- Saves the trained model (`save(MODEL_DIR)`) to the specified model directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "72rUqXNFlugx"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing task.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile task.py\n",
    "# Single, Mirror and Multi-Machine Distributed Training for CIFAR-10\n",
    "\n",
    "import tensorflow_datasets as tfds\n",
    "import tensorflow as tf\n",
    "from tensorflow.python.client import device_lib\n",
    "import argparse\n",
    "import os\n",
    "import sys\n",
    "tfds.disable_progress_bar()\n",
    "\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--lr', dest='lr',\n",
    "                    default=0.01, type=float,\n",
    "                    help='Learning rate.')\n",
    "parser.add_argument('--epochs', dest='epochs',\n",
    "                    default=10, type=int,\n",
    "                    help='Number of epochs.')\n",
    "parser.add_argument('--steps', dest='steps',\n",
    "                    default=200, type=int,\n",
    "                    help='Number of steps per epoch.')\n",
    "parser.add_argument('--distribute', dest='distribute', type=str, default='single',\n",
    "                    help='distributed training strategy')\n",
    "args = parser.parse_args()\n",
    "\n",
    "print('Python Version = {}'.format(sys.version))\n",
    "print('TensorFlow Version = {}'.format(tf.__version__))\n",
    "print('TF_CONFIG = {}'.format(os.environ.get('TF_CONFIG', 'Not found')))\n",
    "print('DEVICES', device_lib.list_local_devices())\n",
    "\n",
    "# Single Machine, single compute device\n",
    "if args.distribute == 'single':\n",
    "    if tf.test.is_gpu_available():\n",
    "        strategy = tf.distribute.OneDeviceStrategy(device=\"/gpu:0\")\n",
    "    else:\n",
    "        strategy = tf.distribute.OneDeviceStrategy(device=\"/cpu:0\")\n",
    "# Single Machine, multiple compute device\n",
    "elif args.distribute == 'mirror':\n",
    "    strategy = tf.distribute.MirroredStrategy()\n",
    "# Multiple Machine, multiple compute device\n",
    "elif args.distribute == 'multi':\n",
    "    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()\n",
    "\n",
    "# Multi-worker configuration\n",
    "print('num_replicas_in_sync = {}'.format(strategy.num_replicas_in_sync))\n",
    "\n",
    "# Preparing dataset\n",
    "BUFFER_SIZE = 10000\n",
    "BATCH_SIZE = 64\n",
    "\n",
    "def make_datasets_unbatched():\n",
    "  # Scaling CIFAR10 data from (0, 255] to (0., 1.]\n",
    "  def scale(image, label):\n",
    "    image = tf.cast(image, tf.float32)\n",
    "    image /= 255.0\n",
    "    return image, label\n",
    "\n",
    "  datasets, info = tfds.load(name='cifar10',\n",
    "                            with_info=True,\n",
    "                            as_supervised=True)\n",
    "  return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE).repeat()\n",
    "\n",
    "\n",
    "# Build the Keras model\n",
    "def build_and_compile_cnn_model():\n",
    "  model = tf.keras.Sequential([\n",
    "      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),\n",
    "      tf.keras.layers.MaxPooling2D(),\n",
    "      tf.keras.layers.Conv2D(32, 3, activation='relu'),\n",
    "      tf.keras.layers.MaxPooling2D(),\n",
    "      tf.keras.layers.Flatten(),\n",
    "      tf.keras.layers.Dense(10, activation='softmax')\n",
    "  ])\n",
    "  model.compile(\n",
    "      loss=tf.keras.losses.sparse_categorical_crossentropy,\n",
    "      optimizer=tf.keras.optimizers.SGD(learning_rate=args.lr),\n",
    "      metrics=['accuracy'])\n",
    "  return model\n",
    "\n",
    "# Train the model\n",
    "NUM_WORKERS = strategy.num_replicas_in_sync\n",
    "# Here the batch size scales up by number of workers since\n",
    "# `tf.data.Dataset.batch` expects the global batch size.\n",
    "GLOBAL_BATCH_SIZE = BATCH_SIZE * NUM_WORKERS\n",
    "MODEL_DIR = os.getenv(\"AIP_MODEL_DIR\")\n",
    "\n",
    "train_dataset = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)\n",
    "\n",
    "with strategy.scope():\n",
    "  # Creation of dataset, and model building/compiling need to be within\n",
    "  # `strategy.scope()`.\n",
    "  model = build_and_compile_cnn_model()\n",
    "\n",
    "model.fit(x=train_dataset, epochs=args.epochs, steps_per_epoch=args.steps)\n",
    "model.save(MODEL_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "train_custom_job"
   },
   "source": [
    "### Train the model\n",
    "\n",
    "Define your custom training job on Vertex AI.\n",
    "\n",
    "Use the `CustomTrainingJob` class to define the job, which takes the following parameters:\n",
    "\n",
    "- `display_name`: The user-defined name of this training pipeline.\n",
    "- `script_path`: The local path to the training script.\n",
    "- `container_uri`: The URI of the training container image.\n",
    "- `requirements`: The list of Python package dependencies of the script.\n",
    "- `model_serving_container_image_uri`: The URI of a container that can serve predictions for your model — either a prebuilt container or a custom container.\n",
    "\n",
    "Use the `run` function to start training, which takes the following parameters:\n",
    "\n",
    "- `args`: The command line arguments to be passed to the Python script.\n",
    "- `replica_count`: The number of worker replicas.\n",
    "- `model_display_name`: The display name of the `Model` if the script produces a managed `Model`.\n",
    "- `machine_type`: The type of machine to use for training.\n",
    "- `accelerator_type`: The hardware accelerator type.\n",
    "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n",
    "\n",
    "The `run` function creates a training pipeline that trains and creates a `Model` object. After the training pipeline completes, the `run` function returns the `Model` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "mxIxvDdglugx"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.utils.source_utils:Training script copied to:\n",
      "gs://qwiklabs-gcp-04-6c865137f72a/aiplatform-2021-08-26-14:26:46.974-aiplatform_custom_trainer_script-0.1.tar.gz.\n",
      "INFO:google.cloud.aiplatform.training_jobs:Training Output directory:\n",
      "gs://qwiklabs-gcp-04-6c865137f72a/aiplatform-custom-training-2021-08-26-14:26:47.099 \n",
      "INFO:google.cloud.aiplatform.training_jobs:View Training:\n",
      "https://console.cloud.google.com/ai/platform/locations/us-central1/training/7773250340236820480?project=21844552219\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_PENDING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_PENDING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_PENDING\n",
      "INFO:google.cloud.aiplatform.training_jobs:View backing custom job:\n",
      "https://console.cloud.google.com/ai/platform/locations/us-central1/training/7937948385984643072?project=21844552219\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob run completed. Resource name: projects/21844552219/locations/us-central1/trainingPipelines/7773250340236820480\n",
      "INFO:google.cloud.aiplatform.training_jobs:Model available at projects/21844552219/locations/us-central1/models/6908020451084075008\n"
     ]
    }
   ],
   "source": [
    "job = aiplatform.CustomTrainingJob(\n",
    "    display_name=JOB_NAME,\n",
    "    script_path=\"task.py\",\n",
    "    container_uri=TRAIN_IMAGE,\n",
    "    requirements=[\"tensorflow_datasets==1.3.0\"],\n",
    "    model_serving_container_image_uri=DEPLOY_IMAGE,\n",
    ")\n",
    "\n",
    "MODEL_DISPLAY_NAME = \"cifar10-\" + TIMESTAMP\n",
    "\n",
    "# TODO 2\n",
    "\n"
    ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "deploy_model:dedicated"
   },
   "source": [
    "### Deploy the model\n",
    "\n",
    "Before you use your model to make predictions, you need to deploy it to an `Endpoint`. You can do this by calling the `deploy` function on the `Model` resource. This will do two things:\n",
    "\n",
    "1. Create an `Endpoint` resource for deploying the `Model` resource to.\n",
    "2. Deploy the `Model` resource to the `Endpoint` resource.\n",
    "\n",
    "\n",
    "The function takes the following parameters:\n",
    "\n",
    "- `deployed_model_display_name`: A human readable name for the deployed model.\n",
    "- `traffic_split`: Percent of traffic at the endpoint that goes to this model, which is specified as a dictionary of one or more key/value pairs.\n",
    "   - If only one model, then specify as **{ \"0\": 100 }**, where \"0\" refers to this model being uploaded and 100 means 100% of the traffic.\n",
    "   - If there are existing models on the endpoint, for which the traffic will be split, then use `model_id` to specify as **{ \"0\": percent, model_id: percent, ... }**, where `model_id` is the model id of an existing model to the deployed endpoint. The percents must add up to 100.\n",
    "- `machine_type`: The type of machine to use for training.\n",
    "- `accelerator_type`: The hardware accelerator type.\n",
    "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n",
    "- `starting_replica_count`: The number of compute instances to initially provision.\n",
    "- `max_replica_count`: The maximum number of compute instances to scale to. In this tutorial, only one instance is provisioned.\n",
    "\n",
    "### Traffic split\n",
    "\n",
    "The `traffic_split` parameter is specified as a Python dictionary. You can deploy more than one instance of your model to an endpoint, and then set the percentage of traffic that goes to each instance.\n",
    "\n",
    "You can use a traffic split to introduce a new model gradually into production. For example, if you had one existing model in production with 100% of the traffic, you could deploy a new model to the same endpoint, direct 10% of traffic to it, and reduce the original model's traffic to 90%. This allows you to monitor the new model's performance while minimizing the disruption to the majority of users.\n",
    "\n",
    "### Compute instance scaling\n",
    "\n",
    "You can specify a single instance (or node) to serve your online prediction requests. This tutorial uses a single node, so the variables `MIN_NODES` and `MAX_NODES` are both set to `1`.\n",
    "\n",
    "If you want to use multiple nodes to serve your online prediction requests, set `MAX_NODES` to the maximum number of nodes you want to use. Vertex AI autoscales the number of nodes used to serve your predictions, up to the maximum number you set. Refer to the [pricing page](https://cloud.google.com/vertex-ai/pricing#prediction-prices) to understand the costs of autoscaling with multiple nodes.\n",
    "\n",
    "### Endpoint\n",
    "\n",
    "The method will block until the model is deployed and eventually return an `Endpoint` object. If this is the first time a model is deployed to the endpoint, it may take a few additional minutes to complete provisioning of resources."
   ]
  },
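  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The traffic split rules above can be sketched in plain Python, without calling Vertex AI. The deployed model ID `'1234567890'` and the helper `is_valid_split` are hypothetical names introduced for this example; they are not part of the Vertex SDK.\n",
    "\n",
    "```python\n",
    "# Hypothetical ID of a model already deployed to the endpoint.\n",
    "EXISTING_DEPLOYED_MODEL_ID = '1234567890'\n",
    "\n",
    "# '0' stands for the model being deployed in the current call.\n",
    "single_model_split = {'0': 100}\n",
    "gradual_rollout_split = {'0': 10, EXISTING_DEPLOYED_MODEL_ID: 90}\n",
    "\n",
    "\n",
    "def is_valid_split(split):\n",
    "    # Percentages must be non-negative integers that add up to 100.\n",
    "    values = list(split.values())\n",
    "    all_valid = all(isinstance(p, int) and p >= 0 for p in values)\n",
    "    return all_valid and sum(values) == 100\n",
    "\n",
    "\n",
    "print(is_valid_split(single_model_split))\n",
    "print(is_valid_split(gradual_rollout_split))\n",
    "# This split only adds up to 90, so the check fails.\n",
    "print(is_valid_split({'0': 10, EXISTING_DEPLOYED_MODEL_ID: 80}))\n",
    "```\n",
    "\n",
    "Checking the dictionary locally before deploying is only a convenience; the service performs its own validation of the split."
   ]
  },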
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "WMH7GrYMlugy"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.models:Creating Endpoint\n",
      "INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/21844552219/locations/us-central1/endpoints/3755562284575883264/operations/1226649531685273600\n",
      "INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/21844552219/locations/us-central1/endpoints/3755562284575883264\n",
      "INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:\n",
      "INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/21844552219/locations/us-central1/endpoints/3755562284575883264')\n",
      "INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/21844552219/locations/us-central1/endpoints/3755562284575883264\n",
      "INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/21844552219/locations/us-central1/endpoints/3755562284575883264/operations/7369559423418630144\n",
      "INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/21844552219/locations/us-central1/endpoints/3755562284575883264\n"
     ]
    }
   ],
   "source": [
     "# TODO 3\n",
     "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "make_prediction"
   },
   "source": [
    "## Make an online prediction request\n",
    "\n",
    "Send an online prediction request to your deployed model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "get_test_item:test"
   },
   "source": [
    "### Get test data\n",
    "\n",
    "Download images from the CIFAR dataset and preprocess them.\n",
    "\n",
    "#### Download the test images\n",
    "\n",
    "Download the provided set of images from the CIFAR dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "E1EQBPGnlugz"
   },
   "outputs": [],
   "source": [
    "# Download the images\n",
    "! gcloud storage cp --recursive gs://cloud-samples-data/ai-platform-unified/cifar_test_images ."   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "prepare_test_item:test,image"
   },
   "source": [
    "#### Preprocess the images\n",
    "Before you can run the data through the endpoint, you need to preprocess it to match the format that your custom model defined in `task.py` expects.\n",
    "\n",
    "`x_test`:\n",
    "Normalize (rescale) the pixel data by dividing each pixel by 255. This replaces each single byte integer pixel with a 32-bit floating point number between 0 and 1.\n",
    "\n",
    "`y_test`:\n",
    "You can extract the labels from the image filenames. Each image's filename format is \"image_{LABEL}_{IMAGE_NUMBER}.jpg\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "cl59KGnXlugz"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "# Load image data\n",
    "IMAGE_DIRECTORY = \"cifar_test_images\"\n",
    "\n",
    "image_files = [file for file in os.listdir(IMAGE_DIRECTORY) if file.endswith(\".jpg\")]\n",
    "\n",
    "# Decode JPEG images into numpy arrays\n",
    "image_data = [\n",
    "    np.asarray(Image.open(os.path.join(IMAGE_DIRECTORY, file))) for file in image_files\n",
    "]\n",
    "\n",
    "# Scale and convert to expected format\n",
    "x_test = [(image / 255.0).astype(np.float32).tolist() for image in image_data]\n",
    "\n",
    "# Extract labels from image name\n",
    "y_test = [int(file.split(\"_\")[1]) for file in image_files]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "send_prediction_request:image"
   },
   "source": [
    "### Send the prediction request\n",
    "\n",
    "Now that you have test images, you can use them to send a prediction request. Use the `Endpoint` object's `predict` function, which takes the following parameters:\n",
    "\n",
    "- `instances`: A list of image instances. According to your custom model, each image instance should be a 3-dimensional matrix of floats. This was prepared in the previous step.\n",
    "\n",
    "The `predict` function returns a list, where each element in the list corresponds to the corresponding image in the request. You will see in the output for each prediction:\n",
    "\n",
    "- Confidence level for the prediction (`predictions`), between 0 and 1, for each of the ten classes.\n",
    "\n",
    "You can then run a quick evaluation on the prediction results:\n",
    "1. `np.argmax`: Convert each list of confidence levels to a label\n",
    "2. Compare the predicted labels to the actual labels\n",
    "3. Calculate `accuracy` as `correct/total`"
   ]
  },
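  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three evaluation steps above can be tried on made-up data before calling the endpoint. In this sketch, `example_predictions` and `example_labels` are invented stand-ins for the endpoint's confidence lists and for `y_test`; they are not real model output.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Invented confidence levels for three images over ten classes.\n",
    "example_predictions = np.full((3, 10), 0.05)\n",
    "example_predictions[0, 2] = 0.55\n",
    "example_predictions[1, 7] = 0.55\n",
    "example_predictions[2, 4] = 0.55\n",
    "\n",
    "# Invented true labels; the third one disagrees with the prediction.\n",
    "example_labels = np.array([2, 7, 1])\n",
    "\n",
    "# 1. Convert each list of confidence levels to a label.\n",
    "predicted_labels = np.argmax(example_predictions, axis=1)\n",
    "\n",
    "# 2. Compare the predicted labels to the actual labels.\n",
    "correct = int(np.sum(predicted_labels == example_labels))\n",
    "\n",
    "# 3. Calculate accuracy as correct/total.\n",
    "accuracy = correct / len(example_labels)\n",
    "print(predicted_labels.tolist(), correct, accuracy)\n",
    "```\n",
    "\n",
    "Here two of the three invented predictions match their labels. With real results, the same steps apply to the list of confidence levels returned by the endpoint and to `y_test`."
   ]
  },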
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "UywuX7fRlugz"
   },
   "outputs": [],
   "source": [
    "# TODO 4\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "undeploy_model"
   },
   "source": [
    "## Undeploy the model\n",
    "\n",
    "To undeploy your `Model` resource from the serving `Endpoint` resource, use the endpoint's `undeploy` method with the following parameter:\n",
    "\n",
    "- `deployed_model_id`: The model deployment identifier returned by the endpoint service when the `Model` resource was deployed. You can retrieve the deployed models using the endpoint's `deployed_models` property.\n",
    "\n",
    "Since this is the only deployed model on the `Endpoint` resource, you can omit `traffic_split`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "khPSAO1tlug0"
   },
   "outputs": [],
   "source": [
    "# TODO 5\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cleanup:custom"
   },
   "source": [
    "# Cleaning up\n",
    "\n",
    "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
    "\n",
    "Otherwise, you can delete the individual resources you created in this tutorial:\n",
    "\n",
    "- Training Job\n",
    "- Model\n",
    "- Endpoint\n",
    "- Cloud Storage Bucket"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NNmebHf7lug0"
   },
   "outputs": [],
   "source": [
    "delete_training_job = True\n",
    "delete_model = True\n",
    "delete_endpoint = True\n",
    "\n",
    "# Warning: Setting this to true will delete everything in your bucket\n",
    "delete_bucket = False\n",
    "\n",
    "# Delete the training job\n",
    "job.delete()\n",
    "\n",
    "# Delete the model\n",
    "model.delete()\n",
    "\n",
    "# Delete the endpoint\n",
    "endpoint.delete()\n",
    "\n",
    "if delete_bucket and \"BUCKET_NAME\" in globals():\n",
    "    ! gcloud storage rm --recursive $BUCKET_NAME"   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "sdk-custom-image-classification-online.ipynb",
   "toc_visible": true
  },
  "environment": {
   "name": "tf2-gpu.2-5.m76",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-5:m76"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
